Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms

Anna Little, Department of Computational Mathematics Science and Engineering, Michigan State University, USA

This talk will discuss new performance guarantees for robust path-based spectral clustering and an efficient approximation algorithm for the longest leg path distance (LLPD) metric, which is based on a sequence of multiscale adjacency graphs. LLPD-based clustering is informative for highly elongated and irregularly shaped clusters, and we prove finite-sample guarantees on its performance when random samples are drawn from multiple intrinsically low-dimensional clusters in high-dimensional space, in the presence of a large number of high-dimensional outliers. More specifically, we derive a condition under which the Laplacian eigengap statistic correctly determines the number of clusters for a large class of data sets, and prove guarantees on the number of points mislabeled by our method. Our methods are quite general and provide performance guarantees for spectral clustering with any ultrametric. We also propose a fast algorithm for implementing path-based spectral clustering which has complexity quasilinear in the number of data points.